December 17, 2020
The COVID-19 data from the Johns Hopkins University and New York Times repositories are used to calculate the rate of new reported cases for each country, and the rates of new reported cases and deaths for each U.S. state and county. These rates are used to fit a predictive regression model for each locale. A risk prediction (ρ) is generated from each model, and the countries, states, and counties with the highest predicted risk are compared in the charts in this document. In the U.S. case and death charts, a generalized additive model (GAM) smoothing function is fit to each data set to make trends easier to see.
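The Johns Hopkins and New York Times time series report cumulative totals, so a daily rate is first obtained by differencing. The document does not specify the form of the regression or how ρ is computed. The sketch below is a hypothetical illustration in Python (the original analysis uses R and the Tidyverse): it differences a cumulative series and fits an ordinary least-squares slope over a trailing window as a simple trend indicator. The names `daily_new` and `trend_slope`, the window length, and the sample counts are illustrative assumptions, not the original model.

```python
from typing import List


def daily_new(cumulative: List[int]) -> List[int]:
    """Convert cumulative totals into daily new counts.

    The first day's new count is its cumulative value. Negative differences
    (downward data corrections) are clipped to zero in this sketch.
    """
    new_counts = []
    previous = 0
    for total in cumulative:
        new_counts.append(max(total - previous, 0))
        previous = total
    return new_counts


def trend_slope(values: List[float], window: int) -> float:
    """Ordinary least-squares slope of the last `window` values versus day index."""
    if window < 2:
        raise ValueError("window must be at least 2")
    ys = values[-window:]
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


# Hypothetical cumulative case counts for one locale.
cumulative_cases = [100, 150, 210, 280, 360]
new_cases = daily_new(cumulative_cases)   # [100, 50, 60, 70, 80]
slope = trend_slope(new_cases, window=4)  # 10.0: new cases rising by 10 per day
```

A positive slope means daily new cases are growing over the window; any mapping from such a slope to a risk score like ρ is a separate modeling choice that the document does not describe.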
The risk assessment methodology used in this analysis has not been fully validated and is affected by noise in the data. For example, as reported in White House press briefings, some counties report the incremental changes from the weekend on Mondays, and cyclical weekly variation can be observed in the data. This limits the accuracy of the model predictions. To increase prediction robustness, the model has been tuned to use data over a multi-day period, as a compromise between how quickly relevant changes in risk are detected and the prediction error caused by sensitivity to noise.
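The document does not state the length of the multi-day period. As a hypothetical illustration of the trade-off, the sketch below applies a 7-day trailing mean: because every 7-day window contains each weekday exactly once, a component that repeats every seven days contributes the same total to each window and is flattened out, at the cost of delaying the response to a genuine change by several days. The function name and the synthetic series are assumptions for illustration only.

```python
from typing import List


def trailing_mean(values: List[float], window: int = 7) -> List[float]:
    """Trailing moving average, one output per complete window.

    With window=7, any component that repeats exactly every seven days adds the
    same total to every window, so a pure weekly reporting cycle becomes flat.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Synthetic series: a steady level with a two-day reporting dip each week.
weekly_cycle = [100, 100, 100, 100, 100, 40, 30] * 3
smoothed = trailing_mean(weekly_cycle)  # 15 identical values, each 570 / 7
```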
The predictive analytics model is built with the open-source R programming language using the Tidyverse family of packages.
There are 191 countries represented in the Johns Hopkins University data set. The Gross Domestic Product (GDP) data shown above represents per capita GDP at purchasing power parity (PPP) in international (Geary-Khamis) dollars. These data are obtained from the Countries by GDP (PPP) per capita (Wikipedia) web page. Only countries with a risk prediction value above 25 are shown.
There have been 17,017,946 total COVID-19 cases (currently 245,033 new cases per day) and 307,642 deaths (currently 3,611 new deaths per day) in the United States from January 21, 2020 to December 16, 2020.
Fourteen states currently have risk predictions above 25.
There are 3,221 U.S. counties represented in the New York Times data set.
To assist the global COVID-19 pandemic response, Google has made available detailed mobility estimates relative to local baselines, derived from mobile phone and other data of the type used by navigation and traffic services such as Google Maps and Waze. Google provides the data in the form of Community Mobility Reports.
As global communities respond to COVID-19, we’ve heard from public health officials that the same type of aggregated, anonymized insights we use in products such as Google Maps could be helpful as they make critical decisions to combat COVID-19.
These Community Mobility Reports aim to provide insights into what has changed in response to policies aimed at combating COVID-19. The reports chart movement trends over time by geography, across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential.
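Google computes the report values itself and publishes them as percent changes from baseline; Google describes the baseline as the median value for the corresponding day of the week during the five-week period from January 3 to February 6, 2020. The sketch below only illustrates that arithmetic on a hypothetical daily visit-count series. The function names and the sample data are assumptions, not part of Google's tooling or of this analysis.

```python
import statistics
from datetime import date, timedelta
from typing import Dict


def weekday_baselines(series: Dict[date, float], start: date, end: date) -> Dict[int, float]:
    """Median value for each weekday (0 = Monday ... 6 = Sunday) within [start, end]."""
    by_weekday: Dict[int, list] = {}
    for day, value in series.items():
        if start <= day <= end:
            by_weekday.setdefault(day.weekday(), []).append(value)
    return {wd: statistics.median(vals) for wd, vals in by_weekday.items()}


def percent_change(series: Dict[date, float], baselines: Dict[int, float]) -> Dict[date, float]:
    """Percent change of each day's value from the baseline for its weekday."""
    result = {}
    for day, value in series.items():
        base = baselines.get(day.weekday())
        if base:  # skip weekdays with no baseline or a zero baseline
            result[day] = 100.0 * (value - base) / base
    return result


# Hypothetical daily visit counts: a flat baseline period, then one reduced day.
visits = {date(2020, 1, 3) + timedelta(days=d): 100.0 for d in range(35)}
visits[date(2020, 4, 1)] = 60.0
baselines = weekday_baselines(visits, date(2020, 1, 3), date(2020, 2, 6))
changes = percent_change(visits, baselines)  # changes[date(2020, 4, 1)] == -40.0
```

Using a separate baseline for each weekday keeps normal weekly rhythms (for example, lower workplace traffic on weekends) from showing up as changes.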
The data used for the analysis below is current through December 13, 2020.
Note: The dotted grey line on each of the mobility charts represents the date (March 13, 2020) on which the U.S. declared a National Emergency Concerning the Novel Coronavirus Disease (COVID-19) Outbreak.
On July 28, 2020, the New York Times released estimates of face mask usage by county calculated from nationwide responses to the survey question “How often do you wear a mask in public when you expect to be within six feet of another person?”. The data was collected from July 2 to July 14, 2020.
This data comes from a large number of interviews conducted online by the global data and survey firm Dynata at the request of The New York Times. The firm asked a question about mask use to obtain 250,000 survey responses between July 2 and July 14, enough data to provide estimates more detailed than the state level. (Several states have imposed new mask requirements since the completion of these interviews.)
An aggregate score was computed for each U.S. county as a weighted average of the fractions of New York Times survey respondents in each response category (never, rarely, sometimes, frequently, always). Each state's aggregate score was then calculated as the mean of its county scores.
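The document does not give the weights used for the county scores. The sketch below assumes evenly spaced hypothetical weights (0 for never up to 1 for always), scales the result to 0-100 so it can be read as a mask-usage percentage, and takes an unweighted mean across counties for the state score. The category keys are assumed to mirror the column names in the New York Times file; the weights, function names, and sample fractions are illustrative only.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical weights for the five response categories; the weights actually
# used in this analysis are not stated.
WEIGHTS = {"NEVER": 0.0, "RARELY": 0.25, "SOMETIMES": 0.5, "FREQUENTLY": 0.75, "ALWAYS": 1.0}


def county_score(fractions: Dict[str, float]) -> float:
    """Weighted average of the response fractions, scaled to 0-100.

    Dividing by the sum of the fractions guards against rows that do not
    sum exactly to 1 because of rounding.
    """
    total = sum(fractions.values())
    weighted = sum(WEIGHTS[category] * share for category, share in fractions.items())
    return 100.0 * weighted / total


def state_score(county_scores: List[float]) -> float:
    """Unweighted mean of a state's county scores."""
    return mean(county_scores)


example = {"NEVER": 0.1, "RARELY": 0.1, "SOMETIMES": 0.2, "FREQUENTLY": 0.2, "ALWAYS": 0.4}
score = county_score(example)  # approximately 67.5
```

Because the state score is a plain mean, a sparsely populated county counts as much as a large one; a population-weighted mean would be an alternative design choice.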
The chart below compares predicted risk, based on analysis of the state case data, with face mask usage for all states with moderate-to-high risk predictions (greater than 5) on July 30, 2020. The intent here is not to find a causal relationship: some states may have high mask usage because of high numbers of confirmed cases locally, and some states may have low local case numbers because of relatively high mask usage. The data may, however, indicate some additional risk for states with high predicted risk based on case data and low mask usage (as of July 30, 2020). Oklahoma and Missouri stand out in this regard, although it is reasonable to expect that mask usage will increase in response to rising cases. (States with predicted risk greater than 25 and mask usage less than 50% are shown in yellow.)
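The selection and highlighting rules for the chart can be written as two threshold tests. This is a minimal sketch: the function name and the placeholder states and values are hypothetical, and only the thresholds (risk above 5 to appear, risk above 25 with mask usage below 50% to be highlighted) come from the text.

```python
from typing import Dict, List, Tuple


def chart_rows(states: Dict[str, Tuple[float, float]]) -> List[Tuple[str, bool]]:
    """Select states for the chart and flag the ones drawn in yellow.

    `states` maps a state name to (predicted risk, mask usage in percent).
    States with risk above 5 are kept; a state is flagged when its risk is
    above 25 and its mask usage is below 50%.
    """
    rows = []
    for name, (risk, mask_pct) in sorted(states.items()):
        if risk > 5:
            rows.append((name, risk > 25 and mask_pct < 50))
    return rows


# Placeholder states and values, not real data.
rows = chart_rows({"State A": (30.0, 45.0), "State B": (30.0, 70.0), "State C": (3.0, 20.0)})
# rows == [("State A", True), ("State B", False)]
```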
Analysis of the New York Times death data for the U.S. reveals a repeating weekly pattern in which the values reported on Sundays and Mondays are consistently lower than those reported on other days of the week. As mentioned in the data analysis description in the Background section, the risk prediction algorithm has been configured to reduce the effect of this variation on the statistical model.
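One simple way to expose a weekly reporting pattern like this is to compare each weekday's mean with the overall mean. The sketch below does that on a synthetic series with built-in Sunday and Monday dips; it is an illustrative check, not the method used in this analysis, and the function name and data are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean
from typing import Dict

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def weekday_ratios(daily: Dict[date, float]) -> Dict[str, float]:
    """Mean value for each weekday divided by the overall mean.

    Ratios consistently below 1.0 for particular weekdays point to a
    recurring reporting dip on those days.
    """
    groups = defaultdict(list)
    for day, value in daily.items():
        groups[DAY_NAMES[day.weekday()]].append(value)
    overall = mean(daily.values())
    return {name: mean(values) / overall for name, values in groups.items()}


# Synthetic four-week series starting on Monday, November 2, 2020,
# with lower values on Sundays and Mondays.
start = date(2020, 11, 2)
synthetic = {}
for i in range(28):
    day = start + timedelta(days=i)
    synthetic[day] = 50.0 if day.weekday() in (0, 6) else 100.0

ratios = weekday_ratios(synthetic)
```

On real data the ratios are noisy, so a dip is more convincing when it recurs across many weeks rather than in a single month.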